Inference for similarity indices
Authors
Abstract
Clustering methods are regarded as unsupervised learning techniques because there are no predetermined subpopulations. Evaluating the results of clustering algorithms is the main topic of cluster validity. This paper aims to contribute to a better understanding of, and inference for, the distribution of the Rand index (Rand, 1971) under several conditions. In addition, a bootstrap test for the significance of the observed value of a similarity index is provided and compared with the p-values obtained from the density function of the Rand index. The comparison is made using simulated data. Let $n_{ij}$ be the number of objects that belong to both cluster $i$ of partition $P_1$ and cluster $j$ of partition $P_2$, $i = 1, \dots, R$, $j = 1, \dots, C$, with $n_{i.} = \sum_{j=1}^{C} n_{ij}$ and $n_{.j} = \sum_{i=1}^{R} n_{ij}$ the sizes of the clusters in $P_1$ and $P_2$ respectively, and $n = \sum_{i=1}^{R} n_{i.} = \sum_{j=1}^{C} n_{.j}$ the total number of observations. The Rand index can then be applied to these counts; this index is given by
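The formula itself is cut off in this copy of the abstract. As a rough illustration only, the sketch below computes the Rand index in its commonly cited contingency-table form, $\left[\binom{n}{2} + 2\sum_{i,j}\binom{n_{ij}}{2} - \sum_i\binom{n_{i.}}{2} - \sum_j\binom{n_{.j}}{2}\right] / \binom{n}{2}$, which is assumed here rather than recovered from the paper. It also includes a simple label-permutation test as a stand-in for the bootstrap test the abstract mentions, whose details are not given. The function names `rand_index` and `permutation_p_value` are hypothetical.

```python
from collections import Counter
from math import comb
import random


def rand_index(p1, p2):
    """Rand index of two partitions given as equal-length label sequences.

    Uses the contingency counts n_ij and the margins n_i. and n_.j
    (standard form; assumed, not taken from the truncated abstract).
    """
    n = len(p1)
    if n != len(p2) or n < 2:
        raise ValueError("partitions must have the same length, at least 2")
    n_ij = Counter(zip(p1, p2))   # cell counts of the R x C table
    n_i = Counter(p1)             # cluster sizes in P1
    n_j = Counter(p2)             # cluster sizes in P2
    total_pairs = comb(n, 2)
    together_in_both = sum(comb(v, 2) for v in n_ij.values())
    together_in_p1 = sum(comb(v, 2) for v in n_i.values())
    together_in_p2 = sum(comb(v, 2) for v in n_j.values())
    agreements = (total_pairs + 2 * together_in_both
                  - together_in_p1 - together_in_p2)
    return agreements / total_pairs


def permutation_p_value(p1, p2, n_resamples=999, seed=0):
    """One-sided p-value from shuffling the labels of p2.

    This is a permutation test, NOT the paper's bootstrap procedure;
    it only illustrates the general resampling idea.
    """
    rng = random.Random(seed)
    observed = rand_index(p1, p2)
    shuffled = list(p2)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # breaks any association with p1
        if rand_index(p1, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    a = [0] * 10 + [1] * 10
    b = [1] * 10 + [0] * 10   # same partition, labels swapped
    print(rand_index(a, b))
    print(permutation_p_value(a, b))
```

Shuffling keeps the cluster sizes of both partitions fixed, so the reference distribution is conditional on the margins $n_{i.}$ and $n_{.j}$; a bootstrap that resamples objects with replacement would not keep them fixed.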